{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Pandas-针对时间序列的操作"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "内容介绍:针对山东大学《Python大数据分析(2020春)》课程中时间序列的笔记<br>\n",
    "地址见：https://www.bilibili.com/video/BV115411Y7UX?p=57"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "from datetime import datetime"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1.将时间序列作为索引"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2018-10-02   -0.563316\n",
       "2018-10-05   -0.114550\n",
       "2018-10-07    1.176024\n",
       "2018-10-08    0.173166\n",
       "2018-10-10   -0.127696\n",
       "2018-10-12    0.943024\n",
       "2017-10-03   -1.005949\n",
       "2017-10-12   -0.682921\n",
       "2017-09-10   -2.124341\n",
       "2017-09-12    0.789678\n",
       "2017-08-10    0.425038\n",
       "2017-08-12    0.997095\n",
       "2017-10-09   -0.701031\n",
       "2017-10-08    1.601193\n",
       "dtype: float64"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 示例数据\n",
    "dates = [\n",
    "    datetime(2018,10,2),datetime(2018,10,5),\n",
    "    datetime(2018,10,7),datetime(2018,10,8),\n",
    "    datetime(2018,10,10),datetime(2018,10,12),\n",
    "    datetime(2017,10,3),datetime(2017,10,12),\n",
    "    datetime(2017,9,10),datetime(2017,9,12),\n",
    "    datetime(2017,8,10),datetime(2017,8,12),\n",
    "    datetime(2017,10,9),datetime(2017,10,8),\n",
    "]\n",
    "df = pd.Series(np.random.randn(14),index=dates)\n",
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "DatetimeIndex(['2018-10-02', '2018-10-05', '2018-10-07', '2018-10-08',\n",
       "               '2018-10-10', '2018-10-12', '2017-10-03', '2017-10-12',\n",
       "               '2017-09-10', '2017-09-12', '2017-08-10', '2017-08-12',\n",
       "               '2017-10-09', '2017-10-08'],\n",
       "              dtype='datetime64[ns]', freq=None)"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#查看索引类型\n",
    "df.index"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2.通过时间序列索引获取数值的方法\n",
    "\n",
    "* 传入的日期参数较为灵活，可以是日期字符串，日期数据和其他可以解释为日期的数值。\n",
    "* 由于pandas使用了第三方库dateutil包的parser.parse，可以自动解析人类可读日期字符串为日期数据，因此传入日期<br>\n",
    "字符串后会自动识别并获取数值（但需要注意这个包的缺陷，即将某些数字自动识别为日期）\n",
    "* 获取的数据为原时间序列的视图(类似于numpy数组)，在视图上的修改会反应在原始数据上。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Timestamp('2018-10-02 00:00:00')"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#使用原始的序号索引，查看索引其中的一项\n",
    "df.index[0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "-1.345226911983526"
      ]
     },
     "execution_count": 39,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#使用可以理解的日期字符串索引获取数据。\n",
    "df['2018-10-2']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "-1.345226911983526"
      ]
     },
     "execution_count": 42,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#使用日期型数据值索引获取值\n",
    "df[datetime(2018,10,2)]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "-1.345226911983526"
      ]
     },
     "execution_count": 44,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#使用另一种可以理解的日期字符串索引获取值\n",
    "df['2018/10/2']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2018-10-02    0.595156\n",
       "2018-10-05   -0.087237\n",
       "2018-10-07   -0.002306\n",
       "2018-10-08    1.741679\n",
       "2018-10-10   -0.802610\n",
       "2018-10-12   -2.002636\n",
       "dtype: float64"
      ]
     },
     "execution_count": 49,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#获取数据时，也可以按照年获取，或者按照月获取\n",
    "df['2018-10']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2017-10-03   -1.005949\n",
       "2017-10-12   -0.682921\n",
       "2017-09-10   -2.124341\n",
       "2017-09-12    0.789678\n",
       "2017-08-10    0.425038\n",
       "2017-08-12    0.997095\n",
       "2017-10-09   -0.701031\n",
       "2017-10-08    1.601193\n",
       "dtype: float64"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#按整年获取数据\n",
    "df['2017']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "<ipython-input-9-687a40a5f217>:2: FutureWarning: Value based partial slicing on non-monotonic DatetimeIndexes with non-existing keys is deprecated and will raise a KeyError in a future Version.\n",
      "  df['2017-8-15':'2017-10-5']\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "2017-10-03   -1.005949\n",
       "2017-09-10   -2.124341\n",
       "2017-09-12    0.789678\n",
       "dtype: float64"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#由于时间序列数据大部分是按照时间排序，因此可以使用不在索引中的时间戳切片\n",
    "df['2017-8-15':'2017-10-5']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2017-08-10    0.425038\n",
       "2017-08-12    0.997095\n",
       "2017-09-10   -2.124341\n",
       "dtype: float64"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#另一种切片的等价函数，但需要数据排序\n",
    "df2 = df.sort_index()\n",
    "df2.truncate(after='2017-9-11')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2017-01-01   -0.893246\n",
       "2017-01-02    1.164275\n",
       "2017-01-03    1.404720\n",
       "2017-01-04   -0.697227\n",
       "2017-01-05   -0.349487\n",
       "                ...   \n",
       "2019-09-23   -0.454415\n",
       "2019-09-24    1.022317\n",
       "2019-09-25    0.282127\n",
       "2019-09-26   -0.730919\n",
       "2019-09-27    0.270407\n",
       "Freq: D, Length: 1000, dtype: float64"
      ]
     },
     "execution_count": 56,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#创建较长数据\n",
    "df_long = pd.Series(np.random.randn(1000),index=pd.date_range('1/1/2017',periods=1000))\n",
    "df_long"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 60,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2017-05-04   -0.556910\n",
       "2017-05-05    0.558927\n",
       "2017-05-06    0.184635\n",
       "2017-05-07   -0.390476\n",
       "2017-05-08    0.237322\n",
       "Freq: D, dtype: float64"
      ]
     },
     "execution_count": 60,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#使用时间切片的方法获取数据\n",
    "df_long['2017-5-4':'2017-5-8']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3.带有重复索引的时间序列"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 69,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2018-10-02    10\n",
       "2018-10-05    10\n",
       "2018-10-07     3\n",
       "2018-10-08     9\n",
       "2018-10-10     5\n",
       "2018-10-12     1\n",
       "2017-10-03     4\n",
       "2017-10-12     3\n",
       "2017-09-10     3\n",
       "2017-09-12     5\n",
       "2017-08-10    12\n",
       "2017-08-12     0\n",
       "2017-10-09    14\n",
       "2017-10-08     7\n",
       "2018-10-02     2\n",
       "2018-10-05     1\n",
       "2018-10-07     1\n",
       "2018-10-08     5\n",
       "2018-10-10    10\n",
       "2018-10-12     2\n",
       "2017-10-03    10\n",
       "2017-10-12     5\n",
       "2017-09-10     6\n",
       "2017-09-12     6\n",
       "2017-08-10     3\n",
       "2017-08-12     5\n",
       "2017-10-09     3\n",
       "2017-10-08     9\n",
       "dtype: int64"
      ]
     },
     "execution_count": 69,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 示例数据\n",
    "dates = [\n",
    "    datetime(2018,10,2),datetime(2018,10,5),\n",
    "    datetime(2018,10,7),datetime(2018,10,8),\n",
    "    datetime(2018,10,10),datetime(2018,10,12),\n",
    "    datetime(2017,10,3),datetime(2017,10,12),\n",
    "    datetime(2017,9,10),datetime(2017,9,12),\n",
    "    datetime(2017,8,10),datetime(2017,8,12),\n",
    "    datetime(2017,10,9),datetime(2017,10,8),\n",
    "]\n",
    "df3 = pd.Series(np.random.randint(0,15,size=(28)),index=dates*2)\n",
    "df3"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 71,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "False"
      ]
     },
     "execution_count": 71,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#检查是否存在重复索引\n",
    "df3.index.is_unique"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 72,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2017-08-10    2\n",
       "2017-08-12    2\n",
       "2017-09-10    2\n",
       "2017-09-12    2\n",
       "2017-10-03    2\n",
       "2017-10-08    2\n",
       "2017-10-09    2\n",
       "2017-10-12    2\n",
       "2018-10-02    2\n",
       "2018-10-05    2\n",
       "2018-10-07    2\n",
       "2018-10-08    2\n",
       "2018-10-10    2\n",
       "2018-10-12    2\n",
       "dtype: int64"
      ]
     },
     "execution_count": 72,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#使用索引聚合,level=0\n",
    "df3.groupby(level=0).count()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 73,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2017-08-10    15\n",
       "2017-08-12     5\n",
       "2017-09-10     9\n",
       "2017-09-12    11\n",
       "2017-10-03    14\n",
       "2017-10-08    16\n",
       "2017-10-09    17\n",
       "2017-10-12     8\n",
       "2018-10-02    12\n",
       "2018-10-05    11\n",
       "2018-10-07     4\n",
       "2018-10-08    14\n",
       "2018-10-10    15\n",
       "2018-10-12     3\n",
       "dtype: int64"
      ]
     },
     "execution_count": 73,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#使用索引聚合,level=0\n",
    "df3.groupby(level=0).sum()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
